Abstract

Let $A$ be a sequence of $n \ge 0$ real numbers. A subsequence of $A$ is a sequence of contiguous elements of $A$. A maximum scoring subsequence of $A$ is a subsequence with largest sum of its elements, which can be found in $O(n)$ time by Kadane's dynamic programming algorithm. We consider in this paper two problems involving maximal scoring subsequences of a sequence. Both of these problems arise in the context of buffer memory minimization in computer networks. The first one, which is called INSERTION IN A SEQUENCE WITH SCORES (ISS), consists in inserting a given real number $x$ in $A$ in such a way to minimize the sum of a maximum scoring subsequence of the resulting sequence, which can be easily done in $O(n^2)$ time by successively applying Kadane's algorithm to compute the maximum scoring subsequence of the resulting sequence corresponding to each possible insertion position for $x$. We show in this paper that the ISS problem can be solved in linear time and space with a more specialized algorithm. The second problem we consider in this paper is SORTING A SEQUENCE BY SCORES (SSS), stated as follows: find a permutation $A'$ of $A$ that minimizes the sum of a maximum scoring subsequence. We show that the SSS problem is strongly NP-hard and give a 2-approximation algorithm for it.

1 Introduction

Let the elements of a sequence $A$ of $n \ge 0$ real numbers be denoted by $a_1, a_2, \ldots, a_n$. Then, $A$ is the sequence $\langle a_1, a_2, \ldots, a_n \rangle$ (which is $\langle\rangle$ if $n = 0$) and its size is $|A| = n$. A subsequence of $A$ defined by indices $0 \le i \le j \le n$ is denoted by $A_i^j$, which equals either $\langle\rangle$, if $i = j$, or the sequence $\langle a_{i+1}, \ldots, a_j \rangle$ of contiguous elements of $A$, otherwise (see Figure 1 for an example). Let $\mathrm{score}(A_i^j) = \sum_{k=i+1}^{j} a_k$ stand for the sum of the elements of $A_i^j$ (we consider $\mathrm{score}(\langle\rangle) = 0$). A maximum scoring subsequence of $A$ is a subsequence with largest score. The MAXIMUM SCORING SUBSEQUENCE (MSS) problem is that of finding a maximum scoring subsequence of a given sequence $A$. The MSS problem can be solved in $O(n)$ time by Kadane's dynamic programming algorithm, whose essence is to consider $A$ as a concatenation $\langle A_{i_1 = 0}^{j_1}, A_{i_2 = j_1}^{j_2}, \ldots, A_{i_\ell = j_{\ell-1}}^{j_\ell = n} \rangle$ of appropriate subsequences, called intervals, and to determine $S_k$ as a maximum scoring subsequence of $A_{i_k}^{j_k}$, for all $k \in \{1, 2, \ldots, \ell\}$. Defining each interval $A_{i_k}^{j_k}$, with the possible exception of the last one, to be such that $\mathrm{score}(A_{i_k}^{j_k}) < 0$ and $\mathrm{score}(A_{i_k}^{j}) \ge 0$ for all $i_k \le j < j_k$, the largest scoring subsequence among $\{S_1, S_2, \ldots, S_\ell\}$ is a maximum scoring subsequence of $A$ lUlEl. The value of $A$ is $\mathrm{score}^*(A) = \mathrm{score}(S)$, for any maximum scoring subsequence $S$ of $A$.
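The linear scan described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `max_scoring_subsequence` and the returned triple are assumptions. It keeps a running prefix score of the current interval and starts a new interval as soon as that score becomes negative. Indices follow the convention $A_i^j = \langle a_{i+1}, \ldots, a_j \rangle$, which coincides with the Python slice `a[i:j]`.

```python
# Hypothetical helper name; a sketch of Kadane's scan, not code from the paper.
def max_scoring_subsequence(a):
    """Return (i, j, score) such that a[i:j] is a maximum scoring subsequence.

    The pair (i, j) follows the convention A_i^j = (a_{i+1}, ..., a_j),
    i.e. the Python slice a[i:j]. The empty subsequence has score 0.
    """
    best_i, best_j, best = 0, 0, 0
    start, running = 0, 0            # the current interval is a[start:pos]
    for pos, value in enumerate(a, start=1):
        running += value
        if running < 0:              # negative prefix: the interval ends at pos
            start, running = pos, 0
        elif running > best:         # new best prefix of the current interval
            best_i, best_j, best = start, pos, running
    return best_i, best_j, best


print(max_scoring_subsequence([2, -3, 4, -1, 5, -7, 3]))  # (2, 5, 8)
```

In the usage line, the reported subsequence is `a[2:5] = [4, -1, 5]`, whose score is 8.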



*This work is partially supported by FUNCAP/INRIA (Ceara State, Brazil/France) and CNPq (Brazil) research projects. A slightly different version of this paper has been submitted for journal publication.

Partially supported by a doctoral scholarship of CAPES (Programa de Demanda Social).
http://www.lia.ufc.br/~pargo
Partially supported by a FUNCAP grant.

Figure 1: An example of a sequence and a subsequence. A maximum scoring subsequence is $A_{13}^{18}$ and $\mathrm{score}^*(A) = 12$.



The MSS problem has several applications in practice, where maximum scoring subsequences correspond to various structures of interest. For instance, it arises in Computational Biology, in the context of certain amino acid scoring schemes, and in several other applications mentioned in |[3ll3. In such a context, it may also be useful to find not only one but a maximal set of non-overlapping maximum scoring subsequences of a given sequence $A$. This can be formalized as the ALL MAXIMAL SCORING SUBSEQUENCES problem, for which the following have been devised: a linear sequential algorithm [4], a PRAM EREW work-optimal algorithm that runs in $O(\log n)$ time and makes $O(n)$ operations [5], and a BSP/CGM parallel algorithm which uses $p$ processors and takes $O(|A|/p)$ time and space per processor 0. The MSS problem has also been generalized in the direction of finding a list of $k$ (possibly overlapping) maximum scoring subsequences of a given sequence $A$. This is known as the k MAXIMUM SUMS Q, and for a generalization of it an optimal $O(n + k)$ time and $O(k)$ space algorithm has been devised HI 13. An optimal $O(n \cdot \max\{1, \log(k/n)\})$ algorithm has also been developed for the related problem of selecting the subsequence with the $k$-th largest score lH.

Sequences of numbers can also model buffer memory usage in a node of a computer network. In this 
case, the absolute value of a number models the local memory space required to store a corresponding 
message after its reception and before it is resent through the network (in practice, there are additional 
cases in which the message is produced or consumed locally; these situations are ignored in this high 
level description for the sake of simplicity of exposition). This behavior can be described more generally 
as the execution of tasks (sending or receiving messages), each of which is associated with a (positive 
or negative) cost that corresponds to the additional units of resources (local memory space) that are 
occupied after its execution. Receiving a message results in a positive cost, while sending a message 
can be viewed as effecting a negative cost. In this context, finding maximum scoring subsequences of
sequences defining communications between the nodes of a network corresponds to finding the greatest
buffer usage in each node [10]. Moreover, when the intention is to find an ordering for these
communications with the aim of minimizing the resulting memory usage, we are left with the problem of
sorting the communications so as to minimize the maximum renewal cumulative cost.

We consider in this paper two problems related to the MSS. The first one, which is called INSERTION IN A SEQUENCE WITH SCORES (ISS), consists in inserting a given real number $x$ in $A$ in such a way to minimize the maximum score of a subsequence of the resulting sequence. The operation of inserting $x$ in $A$ is associated with an insertion index $p \in \{0, \ldots, n\}$ and the resulting sequence $A^{(p)} = \langle A_0^p, x, A_p^n \rangle$, that is, the sequence obtained by the concatenation of $A_0^p$, $x$, and $A_p^n$. The objective of the ISS problem is to determine an insertion index $p^*$ that minimizes $\mathrm{score}^*(A^{(p^*)})$, which can be easily done in $O(n^2)$ time and $O(n)$ space by successively using Kadane's algorithm to compute the maximum scoring subsequence of $A^{(0)}, \ldots, A^{(n)}$. We show in this paper that we can do better. More precisely, we show that the ISS problem can be solved in linear time.
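The quadratic baseline mentioned above is easy to write down and is useful as a reference when checking faster methods. The sketch below is a plain exhaustive search, not the linear-time algorithm of this paper; the names `best_score` and `insert_brute_force` are assumptions introduced only for illustration.

```python
# Hypothetical helper names; exhaustive O(n^2) baseline for the ISS problem.
def best_score(a):
    """score*(a): largest score of a contiguous subsequence (0 if empty)."""
    best = running = 0
    for value in a:
        running = max(0, running + value)
        best = max(best, running)
    return best


def insert_brute_force(a, x):
    """Return (p, value): the first index p in {0, ..., n} minimizing score*(A^(p))."""
    best_p, best_val = 0, None
    for p in range(len(a) + 1):
        val = best_score(a[:p] + [x] + a[p:])   # A^(p) = <A_0^p, x, A_p^n>
        if best_val is None or val < best_val:
            best_p, best_val = p, val
    return best_p, best_val


print(insert_brute_force([3, 4, -2, 5], -6))  # (1, 7)
print(insert_brute_force([1, -5, 2], 3))      # (0, 4)
```

Ties are broken in favour of the smallest index, which is an arbitrary choice of this sketch.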

The ISS problem can be approached more specifically depending on the value of $x$. The case $x = 0$ is trivial since $\mathrm{score}^*(A^{(p)}) = \mathrm{score}^*(A)$ independently of the value of $p$, which means that any insertion index $p$ is optimal for $A$. If $x < 0$, then $\mathrm{score}^*(A^{(p)}) \le \mathrm{score}^*(A)$, for all insertion indices $p \in \{0, 1, \ldots, n\}$. Intuitively, then, $x$ has to be inserted inside some maximum scoring subsequence $S = A_i^j$ of $A$, in an attempt to reduce the value of $A^{(p)}$ with respect to that of $A$. Even though the value of $A^{(p)}$ cannot be smaller than $\mathrm{score}^*(A)$ in certain cases (for instance, if $S$ has only one positive element, or $\mathrm{score}^*(A_0^i) = \mathrm{score}(S)$, or $\mathrm{score}^*(A_j^n) = \mathrm{score}(S)$, then all insertion indices are equally good for $A$ since $\mathrm{score}^*(A^{(p)}) = \mathrm{score}^*(A)$ for any particular choice of $p$), we describe an $O(n)$ time and space algorithm to determine a best insertion position in a maximum scoring subsequence of $A$, provided that $x$ is negative.

Showing that the ISS problem can be solved in linear time is a more complex task when $x > 0$. Inserting $x$ inside a maximum scoring subsequence $S$ of $A$ will certainly lead to a subsequence $S'$ of $A^{(p)}$ such that $\mathrm{score}(S') > \mathrm{score}(S)$ (this may happen even if $x$ is inserted outside $S$). Intuitively, therefore, we should choose an insertion position where $x$ can only "contribute" to subsequences whose scores are as small as possible. Computing the necessary information for this in $O(n)$ time may seem involved at first, but we can make things simpler by considering the partition into intervals of $A$ (the same used in Kadane's algorithm). The idea is to determine the interval $A_{i_k}^{j_k}$ having an optimal insertion index. The difficulty of accomplishing this task in linear time stems from the fact that computing $\mathrm{score}^*(A^{(p)})$ when $p$ is an insertion index in an interval $A_{i_k}^{j_k}$ may involve one or more intervals other than $A_{i_k}^{j_k}$. We overcome this difficulty by means of a dynamic programming approach.

The second problem we consider in this paper is SORTING A SEQUENCE BY SCORES (SSS), stated as follows: given the sequence $A$, find a permutation $A'$ of $A$ that minimizes $\mathrm{score}^*(A')$. The SSS problem is the particular case of the SEQUENCING TO MINIMIZE THE MAXIMUM RENEWAL CUMULATIVE COST problem for which the partial order is empty [11]. It is mentioned in [11] that the SSS problem has been proved to be strongly NP-hard by means of a transformation from the 3-PARTITION problem. Indeed, a straightforward reduction from 3-PARTITION yields that the SSS problem remains NP-hard in the strong sense even if all negative elements in $A$ are equal to $-s$ and every positive element $a_i$ is such that $s/4 < a_i < s/2$, for some number $s$ (more details are given in Section 5). On the other hand, it is known that the SSS problem becomes polynomially solvable if all positive elements are equal to some number $s'$ independent of $s$ [11].

Usually, sorting problems are closely related to insertion ones in the sense that one could expect that an appropriate sequence of insertions would produce a good sorting. In this sense, the ISS and SSS problems can be related as follows. Let MISS be the problem of, given sequences $A = \langle a_1, \ldots, a_n \rangle$ and $X = \langle x_1, \ldots, x_k \rangle$, $k \ge 1$, finding a sequence $A'$ which results from an insertion of the elements of $X$ into $A$ and which minimizes $\mathrm{score}^*(A')$. Note that, since finding an optimal insertion of the elements of a sequence $X$ into an empty sequence $A$ implies finding an optimal permutation of $X$, the MISS problem is NP-hard. Nevertheless, the following recursive algorithm for the MISS problem, which we call MULTIINSERT, can be turned into an exact algorithm for the SSS problem: if $k = 0$, then return $A$; if $k = 1$, then return the sequence resulting from the insertion of $x_1$ in $A$; otherwise, compute the sequence $B_i$, for each $i \in \{0, \ldots, n\}$, which results from the insertion of $x_1$ in position $i$ of $A$, as well as the sequence $C_i = \mathrm{MULTIINSERT}(B_i, \langle x_2, \ldots, x_k \rangle)$, and then return $C_i$ for some $i$ which minimizes $\mathrm{score}^*(C_i)$. A consequence of our linear algorithm for the ISS problem is then that MULTIINSERT can be made to run in $O(f(n,k))$ time, where $f(n,0) = 1$, $f(n,1) = n$ and $f(n,k) = (n+1) \cdot f(n+1, k-1) = O(((n+k)! - (n+k-1)!)/n!)$ if $k \ge 2$. Whether or not there are faster algorithms for the MISS problem is a subject for further investigations.
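The recursion behind MULTIINSERT can be sketched directly. The version below is a deliberately naive illustration under stated assumptions: it handles every level of the recursion by trying all positions, whereas the text uses the linear-time ISS algorithm for the base case $k = 1$; the names `best_score` and `multi_insert` are hypothetical. Its running time is exponential, in line with the factorial bound given above.

```python
# Hypothetical helper names; naive sketch of the MULTIINSERT recursion.
def best_score(a):
    """score*(a): largest score of a contiguous subsequence (0 if empty)."""
    best = running = 0
    for value in a:
        running = max(0, running + value)
        best = max(best, running)
    return best


def multi_insert(a, xs):
    """Insert all elements of xs into a so that score* of the result is minimum."""
    if not xs:                                   # k = 0: nothing to insert
        return list(a)
    best = None
    for i in range(len(a) + 1):                  # B_i: x_1 inserted at position i
        b = a[:i] + [xs[0]] + a[i:]
        c = multi_insert(b, xs[1:])              # C_i = MULTIINSERT(B_i, <x_2..x_k>)
        if best is None or best_score(c) < best_score(best):
            best = c
    return best


# Starting from an empty sequence, the result is an optimal permutation of xs.
print(best_score(multi_insert([], [3, -4, 2, -1])))  # 3
```

The printed value matches the obvious lower bound: any permutation contains the element 3 as a subsequence on its own.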

We present in this paper a $(1 + M/\mathrm{score}^*(A))$-approximation algorithm for the SSS problem, where $M$ is the maximum element in $A$, which runs in $O(n \log n)$ time. For the general case of the SSS problem, since $\mathrm{score}^*(A) \ge M$, this algorithm has an approximation factor of 2, and we show that this factor is tight. However, for a more particular case, still strongly NP-hard, where the elements of $A$ are bounded, from below and above, by linear functions of $\mathrm{score}(A)$, the approximation factor of this same algorithm becomes $3(n+1)/2n$, for $n > 3$.

We organize the remainder of the text as follows. Section 2 states some useful properties of maximum scoring subsequences for later use. In Section 3 and Section 4 we then present our solutions to the ISS problem for the cases where the inserted number $x$ is negative and positive, respectively. Section 5 contains our results on the SSS problem, and Section 6 finally provides conclusions and directions for further investigations.

2 Preliminaries on the ISS problem 

Let us establish some simple and useful properties of sequence $A$ and a subsequence $A_i^j$, for $0 \le i \le j \le n$. We start with three properties that give a view of minimal (with respect to inclusion) maximum scoring subsequences. Let a prefix (suffix) of $A_i^j$ be a subsequence $A_i^{j'}$ ($A_{i'}^j$), with $i \le j' \le j$ ($i \le i' \le j$).

Fact 1. If $A_i^j$ is a maximum scoring subsequence of $A$, then its prefixes and suffixes all have nonnegative scores, otherwise a larger scoring subsequence can be obtained by deleting a prefix or a suffix of negative score. Conversely, $\mathrm{score}(X) \le 0$, where $X$ is any suffix of $A_0^i$ or prefix of $A_j^n$, otherwise a larger scoring subsequence can be obtained by concatenating $A_i^j$ with a suffix of $A_0^i$ or prefix of $A_j^n$ of positive score.

Fact 2. If $A_i^j$ is a maximum scoring subsequence of $A$, then, for every prefix $A_i^{j'}$ (suffix $A_{i'}^j$) of $A_i^j$, there is a maximum scoring subsequence of $A_i^{j'}$ ($A_{i'}^j$) which is a prefix (suffix) of $A_i^j$.

Figure 2: Partition into intervals of the sequence in Figure 1. For each interval, the score of its prefixes is indicated, as well as its maximum scoring subsequence.

The definitions in the sequel are illustrated in Figure 2. The subsequence $A_i^j$ is an interval if $\mathrm{score}(A_i^j) < 0$ or $j = n$, and $\mathrm{score}(A_i^{j'}) \ge 0$, for all $i \le j' < j$. The partition into intervals of $A$ is the concatenation $\langle I_1 = A_{i_1 = 0}^{j_1}, I_2 = A_{i_2 = j_1}^{j_2}, \ldots, I_\ell = A_{i_\ell}^{j_\ell = n} \rangle$ of the $\ell$ maximal intervals of $A$. Such a partition is exploited in Kadane's algorithm due to the fact that a maximum scoring subsequence of $A$ is a subsequence of one of its intervals.

Fact 3. If $A_i^j$ is a maximum scoring subsequence of interval $I_k$ and $A_{i'}^{j'}$ is a prefix (suffix) of $A_i^j$ such that $\mathrm{score}(A_{i'}^{j'}) = 0$, then $A_i^j \setminus A_{i'}^{j'}$ is a maximum scoring subsequence of $I_k$.

While the previous properties are general for every sequence, the next one is more specific to the resulting sequence of an insertion. Recall that $x$ stands for the real number given as input to the ISS problem. Assume that the insertion index $p$ is such that $i_k \le p < j_k$, which means that $x$ is inserted in $I_k$.

Fact 4. The scores of the prefixes of $I_k$ that end at indices greater than $p$ are affected by the insertion of $x$ in the following way: for every $p < q \le j_k + 1$, $\mathrm{score}(A^{(p)}{}_{i_k}^{q}) = \mathrm{score}(A_{i_k}^{q-1}) + x$.

This fact is the reason why the discussion of cases $x < 0$ and $x > 0$ is carried out separately in the next two sections. For the positive case, since all prefixes of $I_k$ have nonnegative scores (Fact 1), consecutive intervals may be merged in the resulting sequence, provided that $x$ is large enough to make $\mathrm{score}(A^{(p)}{}_{i_k}^{j_k+1}) \ge 0$. For instance, consider interval $I_1$ in Figure 2. The insertion of $x = 6$ at the very end of this interval (i.e., at insertion position $p = j_1 - 1 = 5$) creates the subsequence $\langle A_0^5, 6, -4 \rangle$ and the new interval $\langle A_0^5, 6, -4, I_2, I_3 \rangle$. On the other hand, for the negative case, the insertion of $x$ may split $I_k$ into two or more intervals if there exists $p < q \le j_k$ such that $\mathrm{score}(A^{(p)}{}_{i_k}^{q}) < 0$, in which case $A^{(p)}{}_{i_k}^{q}$ is an interval of $A^{(p)}$ but $A^{(p)}{}_{i_k}^{j_k+1}$ is not. Again in Figure 2, the insertion of $x = -6$ between the elements $-2$ and $5$ of interval $I_4$ splits it into 3 intervals, namely $\langle 2, 4, -2, -6 \rangle$, $\langle 5, 3, 0, -6, -4 \rangle$, and $\langle 3, 2, -4, -6 \rangle$.
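The partition into intervals, and the way a negative insertion splits an interval, can be reproduced with a short Python sketch. The function name `partition_into_intervals` is an assumption made for illustration; each interval $A_i^j$ is reported as the pair `(i, j)`, so that its elements are the slice `a[i:j]`. The data are the elements of interval $I_4$ as listed in Figure 3.

```python
# Hypothetical helper name; a sketch of the partition into intervals.
def partition_into_intervals(a):
    """Return [(i_1, j_1), ..., (i_l, j_l)], where interval I_k is a[i_k:j_k].

    An interval ends at the first position where its prefix score becomes
    negative, or at the end of the sequence.
    """
    bounds, start, running = [], 0, 0
    for pos, value in enumerate(a, start=1):
        running += value
        if running < 0 or pos == len(a):
            bounds.append((start, pos))
            start, running = pos, 0
    return bounds


i4 = [2, 4, -2, 5, 3, 0, -6, -4, 3, 2, -4, -6]
print(partition_into_intervals(i4))                      # [(0, 12)]
# Inserting x = -6 between the elements -2 and 5 splits I_4 into three intervals.
print(partition_into_intervals(i4[:3] + [-6] + i4[3:]))  # [(0, 4), (4, 9), (9, 13)]
```

The three pairs in the second output correspond to the intervals $\langle 2, 4, -2, -6 \rangle$, $\langle 5, 3, 0, -6, -4 \rangle$ and $\langle 3, 2, -4, -6 \rangle$ mentioned in the text.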
3 Inserting x < 0

As already mentioned in the Introduction, solving the ISS problem when $x < 0$ corresponds to inserting $x$ in some maximum scoring subsequence $A_i^j$. According to Fact 3, we assume that $A_i^j$ is minimal with respect to inclusion. What remains to be specified is the way to find an appropriate insertion index in $A_i^j$. The algorithm in the sequel is based on the fact that inserting $x$ inside $A_i^j$ divides the latter into its left (a prefix of $A_i^j$) and right (a suffix of $A_i^j$) parts, and different choices of $p$ may lead to different values of $A^{(p)}$, as depicted in Figure 3. One solution is then to simply try out each possibility for $p$ in the range from $i + 1$ to $j - 1$, computing the maximum between the values of the resulting left and right parts of $A_i^j$. Doing this straightforwardly takes $\Theta(n^2)$ time, since we would run Kadane's algorithm twice for each possible value of $p$ (once for the left and once more for the right part of $A_i^j$). Fortunately, this can also be easily done in $O(n)$ time, since a left-to-right traversal of $A_i^j$ can be used to compute (and store) the values of all possible left parts of $A_i^j$, and a further right-to-left traversal can be used to compute the values of all possible right parts of $A_i^j$. This strategy is materialized in Algorithm 1, which employs a version of Kadane's algorithm as a subroutine returning the indices $i$ and $j$ and the score of the minimal maximum scoring subsequence considered.



Index:    14  15  16  17  18  19  20  21  22  23  24  25
Element:   2   4  -2   5   3   0  -6  -4   3   2  -4  -6

(a) Interval. (b) p = 17. (c) p = 16. (d) p = 15. (e) p = 14.

Figure 3: Possible insertion positions in the interval $I_4$ of the example in Figure 2 for $x = -4$.

Three main variables are used in the algorithm, with the following interpretation. Variable SX plays different roles in the loops at lines 6 and 12. In the first case, it stores the prefix sums of $A_i^j$, while in the second, suffix sums. Array L and variable R store the maximum scores of prefixes and suffixes of $A_i^j$, respectively.

Lemma 1. Algorithm 1 is correct, i.e. INSERTIONOFNEGATIVE(A, x) returns an optimal insertion index $p$, provided that $x < 0$. In addition, it runs in $O(n)$ time and space.

Proof. The trivial cases $n = 0$, $j \le i + 1$, and $\mathrm{score}(A_i^j) = 0$ are properly handled at line 3. Then, assume that $n > 0$, $j > i + 1$, and $\mathrm{score}(A_i^j) > 0$.

Let $p \in \{i + 1, \ldots, j - 1\}$ be the value returned by the algorithm and $p' \ne p$ be another arbitrary insertion index. We show that $\mathrm{score}^*(A^{(p)}) \le \mathrm{score}^*(A^{(p')})$. Let in addition $T$ be a maximum scoring
Algorithm 1: InsertionOfNegative

Input: an array A of n ≥ 0 real numbers and a real number x < 0
Output: an optimal insertion index for A

 1  (i, j, s) ← Kadane(A)
 2
 3  if j ≤ i + 1 or s = 0 then return i
 4  SX ← a_{i+1}
 5  L[1] ← SX
 6  for k ← 2, ..., j − i do        // Largest scores of prefixes of A_i^j
 7      SX ← SX + a_{i+k}
 8      L[k] ← max{L[k − 1], SX}
 9  m ← L[j − i]
10  SX ← 0
11  R ← 0
12  for k ← j − i, ..., 2 do        // Largest scores of suffixes of A_i^j
13      SX ← SX + a_{i+k}
14      R ← max{R, SX}
15      if max{L[k − 1], R} < m then
16          m ← max{L[k − 1], R}
17          p ← i + k − 1
18  return p




subsequence of $A^{(p)}$, minimal with respect to inclusion. Note that $T \ne \langle\rangle$ since $\mathrm{score}^*(A) = \mathrm{score}(A_i^j) > 0$. Moreover, by Fact 1, $x$ is neither the first nor the last element of $T$. So, let $y$ and $z$ be such that $T_0^1 = \langle a_{y+1} \rangle$ and $T_{|T|-1}^{|T|} = \langle a_z \rangle$. The first case to be analyzed is when $x$ is in $T$, i.e. $y < p < z$ (Figure 4(a)). In this case, by Fact 1 and the minimality of $A_i^j$ and $T$, $y = i$ and $z = j$ or, in other words, $T = \langle A_i^p, x, A_p^j \rangle$. The elements of $A_i^j$ also form, perhaps with the occurrence of $x$ at some position, a subsequence $T'$ of $A^{(p')}$, and since $x < 0$, we conclude that $\mathrm{score}(T') \ge \mathrm{score}(T)$ (equality holds if $y < p' < z$). Then $\mathrm{score}^*(A^{(p)}) = \mathrm{score}(T) \le \mathrm{score}(T') \le \mathrm{score}^*(A^{(p')})$, as claimed.

Assume that $p \notin \{y, \ldots, z\}$. If $T$'s elements also form a subsequence of $A^{(p')}$ (more precisely, $p' \notin \{y + 1, \ldots, z - 1\}$), then $\mathrm{score}^*(A^{(p)}) = \mathrm{score}(T) \le \mathrm{score}^*(A^{(p')})$, as desired. Then, assume that $p' \in \{y + 1, \ldots, z - 1\}$. If $A_i^j$ and $T$ are disjoint, then $A_i^j$ is also a subsequence of $A^{(p')}$. It turns out that $\mathrm{score}^*(A^{(p)}) \le \mathrm{score}^*(A) = \mathrm{score}(A_i^j)$ yields $\mathrm{score}^*(A^{(p')}) = \mathrm{score}(A_i^j) = \mathrm{score}^*(A) \ge \mathrm{score}^*(A^{(p)})$.

Finally, we are left with the case when $A_i^j$ and $T$ are not disjoint (Figure 4(b)), which requires a more detailed analysis of Algorithm 1. By Fact 1 and the minimality of $T$, either $y = i$ or $z = j$. Without loss of generality, let us suppose the first equality, since the other one is analogous. The values computed in the loops of lines 8 and 14 for each $L[k-1]$ and $R$ correspond to $\mathrm{score}^*(A_i^{i+k-1})$ and $\mathrm{score}^*(A_{i+k-1}^{j})$, respectively, by the property established in Fact 2. Due to line 15, we have that $\max\{\mathrm{score}^*(A_i^{p'}), \mathrm{score}^*(A_{p'}^{j})\} \ge \mathrm{score}^*(A_i^{p}) = \mathrm{score}(T)$. The result follows since both $A_i^{p'}$ and $A_{p'}^{j}$ are subsequences of $A^{(p')}$.

The complexities stem directly from the facts that the algorithm employs one array of size $O(n)$ and performs two disjoint $O(n)$-time loops. □
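For readers who prefer executable code, the two-pass strategy of Algorithm 1 can be transcribed into Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: the helper `minimal_max_scoring_subsequence` is one possible "version of Kadane's algorithm" returning a maximum scoring subsequence that is minimal with respect to inclusion, the initial value `p = i` is an assumption for a line that is not legible in the listing, and all function names are hypothetical. As in the pseudocode, the value of $x$ is never read; only its sign matters.

```python
# Hypothetical helper names; a transcription sketch of Algorithm 1.
def best_score(a):
    """score*(a), used only to check results in the usage example."""
    best = running = 0
    for value in a:
        running = max(0, running + value)
        best = max(best, running)
    return best


def minimal_max_scoring_subsequence(a):
    """Return (i, j, s) with a[i:j] a maximum scoring subsequence having no
    proper prefix or suffix of score 0 (minimal with respect to inclusion)."""
    best_i, best_j, best = 0, 0, 0
    start, running = 0, 0
    for pos, value in enumerate(a, start=1):
        running += value
        if running <= 0:             # drop prefixes of nonpositive score
            start, running = pos, 0
        elif running > best:         # strict: keep the earliest (shortest) end
            best_i, best_j, best = start, pos, running
    return best_i, best_j, best


def insertion_of_negative(a, x):
    """Return an insertion index for a negative x, following Algorithm 1."""
    i, j, s = minimal_max_scoring_subsequence(a)
    if j <= i + 1 or s == 0:
        return i
    size = j - i
    left = [0] * (size + 1)          # left[k] plays the role of L[k]
    sx = a[i]                        # a_{i+1} in the paper's 1-based notation
    left[1] = sx
    for k in range(2, size + 1):     # largest scores of prefixes of A_i^j
        sx += a[i + k - 1]           # a_{i+k}
        left[k] = max(left[k - 1], sx)
    m, p = left[size], i             # p = i is an assumed initial value
    sx = right = 0
    for k in range(size, 1, -1):     # largest scores of suffixes of A_i^j
        sx += a[i + k - 1]
        right = max(right, sx)
        if max(left[k - 1], right) < m:
            m = max(left[k - 1], right)
            p = i + k - 1
    return p


a = [3, 4, -2, 5]
p = insertion_of_negative(a, -6)
print(p, best_score(a[:p] + [-6] + a[p:]))  # 3 7
```

On this input the exhaustive search over all positions also yields the value 7, so the returned index is optimal here.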
(a) $x$ is in $T$ in $A^{(p)}$. There are three possible situations for $A^{(p')}$, as indicated.
(b) $p' \in \{y + 1, \ldots, z - 1\}$, and $A_i^j$ and $T$ are not disjoint.

Figure 4: Cases of proof of Lemma 1.



4 Inserting x > 0

The discussion in this section is based on the partition into intervals $\langle I_1, I_2, \ldots, I_\ell \rangle$ of $A$. For the sake of convenience, we assume that $a_n = 0$ (observe that this can be done without loss of generality since appending a new null element to $A$ does not alter the scores of the suffixes of $A$), which means that $\mathrm{score}(I_\ell) \ge 0$. A particularity of this positive case, which is derived from Fact 4, is the following: for every interval $I_k$, index $j_k - 1$ is at least as good as any other insertion index in this interval. Thus, an optimal insertion index exists among $j_1 - 1, j_2 - 1, \ldots, j_\ell - 1$, each one of these indices corresponding to one interval of the partition into intervals of $A$. If $p = j_k - 1$ is chosen as the insertion index, then the resulting interval in $A^{(p)}$ (which may correspond to a merge of several contiguous intervals of $A$ in the sense of Fact 4) is referred to as an extended interval, relative to $I_k$, and denoted by $I^{(k)}$. If $I_{k'}$ is one of the intervals which are merged to produce $I^{(k)}$, then $I_{k'}$ is a subinterval of $I^{(k)}$. In the remainder of this section, we show a linear time algorithm to compute $\mathrm{score}^*(I^{(k)})$, for all $k \in \{1, 2, \ldots, \ell\}$. Clearly, the smallest of these values is associated with the optimal insertion index for $x$.
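The reduction of the search space to one candidate index per interval can be illustrated with the following Python sketch. It is not the linear-time algorithm developed in this section: it simply evaluates each candidate $j_k - 1$ with a full scan, so it takes $O(n \cdot \ell)$ time. The names `candidate_insertion_indices` and `insert_positive` are assumptions, and the null last element $a_n = 0$ is appended internally.

```python
# Hypothetical helper names; candidate-based search for a positive x.
def best_score(a):
    """score*(a): largest score of a contiguous subsequence (0 if empty)."""
    best = running = 0
    for value in a:
        running = max(0, running + value)
        best = max(best, running)
    return best


def candidate_insertion_indices(a):
    """Return the indices j_k - 1, one per interval of a padded with a final 0."""
    padded = list(a) + [0]           # convention of this section: a_n = 0
    candidates, running = [], 0
    for pos, value in enumerate(padded, start=1):
        running += value
        if running < 0 or pos == len(padded):   # interval I_k ends at j_k = pos
            candidates.append(pos - 1)
            running = 0
    return candidates


def insert_positive(a, x):
    """Return (p, value) over the candidate indices only."""
    best_p, best_val = None, None
    for p in candidate_insertion_indices(a):
        val = best_score(a[:p] + [x] + a[p:])
        if best_val is None or val < best_val:
            best_p, best_val = p, val
    return best_p, best_val


print(candidate_insertion_indices([1, -5, 2]))  # [1, 3]
print(insert_positive([1, -5, 2], 3))           # (1, 4)
```

In this small example the exhaustive search over all four positions also gives the minimum value 4, in agreement with the claim that the candidates suffice.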

For each $k$, computing $\mathrm{score}^*(I^{(k)})$ by means of Kadane's algorithm takes $\Theta(n)$ time. Therefore, the exhaustive search takes quadratic time in the worst case. However, as depicted in Figure 5, by graphically aligning the scores of the prefixes of the extended intervals with respect to the intervals of $A$, one can visualize some useful observations in connection with these curves which are exploited in the algorithm described in the sequel. Let the sequence of negative elements composed by intervals' scores be denoted by $N = \langle \mathrm{score}(I_1), \mathrm{score}(I_2), \ldots, \mathrm{score}(I_\ell) \rangle$.

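Observation 1. Let $a \in I_{k'}$ be the element of indices $j$ in $I^{(k)}$ and $j'$ in $I_{k'}$, $k' \ge k + 1$ (an assumption that is tacitly made here is that $I_{k'}$ is a subinterval of $I^{(k)}$). Then,

$$\begin{aligned}
\mathrm{score}(I^{(k)}{}_0^{j}) &= \mathrm{score}(A_{i_k}^{j_k - 1}) + x + a_{j_k} + \mathrm{score}(A_{j_k}^{i_{k'} + j'}) \\
&= x + \mathrm{score}(A_{i_k}^{i_{k'}}) + \mathrm{score}((I_{k'})_0^{j'}) \\
&= x + \mathrm{score}(N_{k-1}^{k'-1}) + \mathrm{score}((I_{k'})_0^{j'}).
\end{aligned}$$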

As an example, take a = 4, I^^^ = and 4' = h in Figure^ The equality above indicates the distance 
of 1 between the curves ofl^^^ and Ik' for the element 4 € I4. 

A first consequence of Observation 1 is a recurrence relation which is used to govern our dynamic programming algorithm. If $k < \ell$, let $I^{(k)} \cap I^{(k+1)}$ stand for the concatenation of the common subintervals of $I^{(k)}$ and $I^{(k+1)}$ (for the sake of illustration, see the example of Figure 5). In addition, write $I_{k'} \subseteq I^{(k)} \cap I^{(k+1)}$ to say that interval $I_{k'}$ is a common subinterval of $I^{(k)}$ and $I^{(k+1)}$. The recurrence for $\mathrm{score}^*(I^{(k)})$ is given by

$$\mathrm{score}^*(I^{(k)}) = \max\{\mathrm{score}^*(I_k),\; x + \mathrm{score}(A_{i_k}^{j_k - 1})\}, \qquad (1)$$

if $k = \ell$ (considering that the last element of $A$ is null) or ($k < \ell$ and $I^{(k)} \cap I^{(k+1)} = \emptyset$) or, otherwise,

$$\max\Big\{\mathrm{score}^*(I_k),\; x + \mathrm{score}(A_{i_k}^{j_k - 1}),\; x + \max_{I_{k'} \subseteq I^{(k)} \cap I^{(k+1)}} \{\mathrm{score}(N_{k-1}^{k'-1}) + \mathrm{score}^*(I_{k'})\}\Big\}. \qquad (2)$$

The first two terms in (1) and (2) indicate the best insertion index in $I_k$, while the third one in (2) gives the best interval in $I^{(k)} \cap I^{(k+1)}$ (if any). The crucial point is then the computation of $\max_{I_{k'} \subseteq I^{(k)} \cap I^{(k+1)}} \{\mathrm{score}(N_{k-1}^{k'-1}) + \mathrm{score}^*(I_{k'})\}$ when $I_{k+1}$ is a subinterval of $I^{(k)}$ (i.e. $I^{(k)} \cap I^{(k+1)} \ne \emptyset$), which is performed in the light of the following additional observations.

Observation 2. Let $a \in I^{(k)} \cap I^{(k')}$ be the element of indices $j$ and $j'$ in, respectively, $I^{(k)}$ and $I^{(k')}$, $k' \ge k + 1$. Write $I_{k''}$ for the interval containing $a$, and $j''$ for the index of $a$ in $I_{k''}$. Assuming that $k'' \ne k'$, then

$$\begin{aligned}
\mathrm{score}(I^{(k)}{}_0^{j}) - \mathrm{score}(I^{(k')}{}_0^{j'}) &= \mathrm{score}(N_{k-1}^{k''-1}) + \mathrm{score}((I_{k''})_0^{j''}) - \mathrm{score}(N_{k'-1}^{k''-1}) - \mathrm{score}((I_{k''})_0^{j''}) \\
&= \mathrm{score}(N_{k-1}^{k'-1}).
\end{aligned}$$

Thus, the respective curves of $I^{(k)}$ and $I^{(k')}$ remain at a constant distance for all intervals $I_{k''} \subseteq I^{(k)} \cap I^{(k')}$, $k'' \ne k'$, with the curve of $I^{(k')}$ above that of $I^{(k)}$.

The last observation before going into the details of the algorithm is useful to decide whether a given interval $I_{k'}$ is a subinterval of $I^{(k)}$.

Observation 3. Observation 1 implies that if interval $I_{k'}$, $k' \ge k + 1$, is contained in $I^{(k)}$, then $x + \mathrm{score}(N_{k-1}^{k'-1}) \ge 0$. The converse is also true since $x + \mathrm{score}(N_{k-1}^{k'-1}) \ge 0$ yields $x + \mathrm{score}(N_{k-1}^{k''-1}) \ge 0$, for all $k < k'' < k'$, because all members of $N$ are negative.
Figure 5: Scores of prefixes of all possible extended intervals resulting from the insertion of $x = 9$ in the sequence in Figure 1. For each interval $I_k$, the points corresponding to $\mathrm{score}^*(I_k)$ and $\mathrm{score}^*(I^{(k)})$ are highlighted. The last null element of the sequence is omitted.

The computation of the largest scores of prefixes of extended intervals $I^{(k)}$ is divided into two phases. The first phase is a modification of Kadane's algorithm and its role is twofold. It first determines the largest scores of prefixes of $I_1, I_2, \ldots, I_\ell$ and, then, it sets the initial values of the arrays that are used in the second phase. Such arrays are the following:

SN suffix sums of $N$, i.e. SN[k] equals $\mathrm{score}(N_{k-1}^{\ell})$, for all $k \in \{1, 2, \ldots, \ell\}$. By definition, $\mathrm{score}(N_{k-1}^{k'-1}) = \mathrm{SN}[k] - \mathrm{SN}[k']$, for all $k' > k$.

INTSCR largest intervals' scores, i.e. INTSCR[k] $= \mathrm{score}^*(I_k)$, for all $k \in \{1, 2, \ldots, \ell\}$.

XSCR for each interval $k \in \{1, 2, \ldots, \ell\}$, this array stores the score of the subsequence ending at $x$, provided that $x$ is inserted in $I_k$, i.e. XSCR[k] $= x + \mathrm{score}(A_{i_k}^{j_k - 1})$.

The second phase is devoted to the computation of the extended interval containing the best insertion position for $x$. This is done iteratively from $k = 1$ until $k = \ell$. For each $k$, the recurrence relation (1)-(2) is used to start the computation of $\mathrm{score}^*(I^{(k)})$ and to update the maximum score of extended intervals started in previous iterations as described in Algorithm 2. Such information is stored as follows. The array EXTSCR contains the maximum scores of prefixes of the extended intervals $I^{(k')}$ for all $k' \in \{1, 2, \ldots, k\}$. The intervals with best prefix scores obtained so far are kept in the queue INTQ. Q is the rear of the queue INTQ, initialized at 0.



Algorithm 2: Second phase for the case x > 0

Input: Arrays SN, INTSCR, and XSCR computed in the first phase
Output: An optimal insertion interval for A

 1  k ← 1
 2  EXTSCR[k] ← max{INTSCR[k], XSCR[k]}
 3  Q ← 1
 4  INTQ[Q] ← k
 5  for k ← 2, ..., ℓ do
 6      DIST ← x + SN[INTQ[Q]] − SN[k]
 7      while DIST ≥ 0 and DIST + INTSCR[k] > EXTSCR[INTQ[Q]] do
 8          EXTSCR[INTQ[Q]] ← DIST + INTSCR[k]
 9          if Q > 1 and EXTSCR[INTQ[Q]] > EXTSCR[INTQ[Q − 1]] then
10
11              DIST ← x + SN[INTQ[Q]] − SN[k]
12      EXTSCR[k] ← max{INTSCR[k], XSCR[k]}
13      if EXTSCR[k] < EXTSCR[INTQ[Q]] then
14
15          INTQ[Q] ← k
16  return INTQ[Q]



The correctness of the two-phase algorithm stems from the following lemma.

Lemma 2. For every iteration $k$ (just before execution of line 5 of Algorithm 2), let $I_{k'}$ be an interval and $k'' = \mathrm{INTQ}[Q]$. Then, the following conditions hold:

1. EXTSCR[k"]= score* {I^''"^\Aj-J; 

2. ifQ>\ and k' appears in INTQ but k" / k', then k' < k" and score* {I^''"^ < score* (I^'"'^ \ 
Af); and 

3. ifk' < k does not appear in INTQ, then k" is such that score* [I^^"^ ) - score* {I^^'^ 

Proof. By induction on $k$. For $k = 1$, condition 1 holds trivially due to line 2, while conditions 2 and 3 hold by vacuity. Let $k > 1$. We need to analyze the changes in INTQ. We start with the intervals that are removed from INTQ. At line 6, Observation 1 is used to compute the distance between the curves of $I^{(k'')}$ and $I_k$. If this distance is negative, then $I_k$ is not a subinterval of $I^{(k'')}$. Otherwise, condition 1 of the induction hypothesis is used in the comparison of line 7 and EXTSCR[k''] is updated at line 8 according to (2) using Observation 1. So, condition 1 remains valid for $k$ up to this point of the execution. If EXTSCR[k''] increases (i.e. line 8 is executed), then Observation 2 and condition 2 of the induction hypothesis are invoked to remove $I^{(k'')}$ from the queue, respecting condition 3, in case a point of $I_k$ in the curve of $I^{(k'')}$ overcomes that of an interval that precedes $I_{k''}$. EXTSCR[k''] is updated again according to (2) in order to satisfy condition 1. This procedure is repeated until condition 2 is valid for the intervals still in INTQ.

Finally, lines 12-15 correspond to the insertion in INTQ. The maximum score of the prefix of $I^{(k)}$ containing $I_k$ and $x$ only is updated at line 12, and $I^{(k)}$ enters the queue only if such maximum score is below the maximum score of the prefix of $I^{(\mathrm{INTQ}[Q])}$ considered so far. This implies that conditions 2 and 3 are also valid for $k$. □

Theorem 1. The ISS problem can be solved in $O(n)$ time and space.

5 Sorting 

We now turn our attention to the SSS problem. Its hardness is analyzed considering the following derived problem.

Restricted version of the SSS problem: given two positive integers $k$ and $s$, we denote by SSS$(k, s)$ the restricted version of the SSS problem where $n = 3k$, the elements in $A$ are integers and bounded by a polynomial function of $k$, all negative elements are equal to $-s$, and every positive element $a_i$ is such that $s/4 < a_i < s/2$.

The fact that sorting a sequence amounts to accommodating the positive elements so as to create an appropriate partition into intervals leads to the following result.

Theorem 2. The SSS$(k, s)$ problem is strongly NP-hard.

Proof. By reduction from the 3-PARTITION decision problem, stated as follows: given $3k$ positive integers $a_1, \ldots, a_{3k}$, all polynomially bounded in $k$, and a threshold $s$ such that $s/4 < a_i < s/2$ and $\sum_{i=1}^{3k} a_i = ks$, do there exist $k$ disjoint triples of $a_1$ to $a_{3k}$ such that each triple sums up to exactly $s$? The 3-PARTITION problem is known to be NP-complete in the strong sense [12].

Given an instance $C$ of the 3-PARTITION problem, an instance of the SSS$(k, s)$ problem is defined by an arbitrary permutation $A$ of the multiset $C'$ obtained from $C$ by the inclusion of $k$ occurrences of $-s$. A solution for the SSS instance is to choose elements of $C$ for each negative element of $C'$, which gives a partition of $C$. Since $a_i > s/4$, for all $i \in \{1, \ldots, 3k\}$, every sequence of 4 positive elements chosen from $C'$ has value greater than $s$. Thus, $C$ is a "yes" instance of the 3-PARTITION problem if and only if $\mathrm{score}^*(A') = s$. □
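The construction used in the proof is simple enough to spell out in code. The following Python sketch builds the SSS instance from a 3-PARTITION instance and shows that a valid set of triples yields a permutation of value exactly $s$. It only illustrates the easy direction of the equivalence; the function names and the small instance are assumptions made for this example.

```python
# Hypothetical helper names; illustration of the reduction from 3-PARTITION.
def best_score(a):
    """score*(a): largest score of a contiguous subsequence (0 if empty)."""
    best = running = 0
    for value in a:
        running = max(0, running + value)
        best = max(best, running)
    return best


def sss_instance_from_3partition(c, s):
    """Return the multiset C' (as a list): C plus k occurrences of -s."""
    k, rest = divmod(len(c), 3)
    if rest != 0 or sum(c) != k * s:
        raise ValueError("need 3k integers summing to k*s")
    if any(not (s < 4 * a and 2 * a < s) for a in c):   # s/4 < a_i < s/2
        raise ValueError("each element must lie strictly between s/4 and s/2")
    return list(c) + [-s] * k


def sequence_from_triples(triples, s):
    """Permutation of C' in which each triple is followed by one -s."""
    seq = []
    for triple in triples:
        seq.extend(triple)
        seq.append(-s)
    return seq


c, s = [4, 5, 6, 4, 5, 6], 15
instance = sss_instance_from_3partition(c, s)
solution = sequence_from_triples([(4, 5, 6), (4, 5, 6)], s)
print(best_score(instance), best_score(solution))  # 30 15
```

Each triple forms one interval whose largest prefix score is $s$, and the following $-s$ brings the running score back to 0, so the arranged sequence has value $s = 15$.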

We show in the sequel that Algorithm |3] is a parametrized approximation algorithm for the SSS 
problem. Such an algorithm builds a permutation of A keeping the maximum scoring subsequence of all 
intervals, except the last one, bounded by the input parameter plus the lai^gest element of A. For the last 
interval, the following holds for every sequence A. 

Observation 4. If $N = (score(I_1), score(I_2), \ldots, score(I_\ell))$ is the sequence of negative elements
formed by the intervals' scores, then $score(N_{\ell-1}^{\ell}) = score(A) - score(N_0^{\ell-1})$. Considering that $I_\ell$ is a
subsequence of $A$ and that $score(N_0^{\ell-1}) < 0$, we conclude that $score(N_{\ell-1}^{\ell})$ is a lower bound for $score^*(A)$ at
least as good as $score(A)$.

Algorithm 3 gets as input, in addition to the instance $A$ (with size $n$), the parameter $L$, which depends
on $M = \max_{a \in A} a$. A variable $S$ is used to keep the score of the interval being currently constructed. Just
after step 10 is executed, it turns out that $L + M \geq S \geq L$. On the other hand, execution of step 15 leads to
$S < L$ or includes all remaining negative elements in $A'$. After that, if $S + score(Q) + score(R) < 0$, then
a new interval $I_\ell$ is established and $S$ is incremented by $score(N_{\ell-1}^{\ell})$. A straightforward consequence is
that $score^*(A') > L + M$ only if step 16 is executed with positive elements of $A$, and this is due to the last
interval (in the sense of Observation 4). This leads to the following result.

Lemma 3. Let $A$ be an instance of the SSS problem, $A'$ be the sequence returned by the call
ParametrizedSorting$(A, L)$, for some $L \geq M$, and $N'$ be the sequence of the scores of the $\ell'$ intervals of $A'$.
Then,

$$score^*(A') \leq \max\{0,\; L + M,\; score({N'}_{\ell'-1}^{\ell'}) = score(A) - score({N'}_{0}^{\ell'-1})\}. \tag{3}$$

Moreover, ParametrizedSorting$(A, L)$ runs in $O(n)$ time.
Algorithm 3: ParametrizedSorting(A, L)

Input: an array $A$ of $n \geq 0$ numbers and a parameter $L \geq M$
Output: an array $A'$ containing a permutation of $A$

 1  Let $A'$ be an array of size $n$
 2  Let $A^- \subseteq A$ and $A^+ \subseteq A$ be sequences of negative and positive members of $A$, respectively
 3  $j \leftarrow 1$
 4  $S \leftarrow 0$
 5  while $A^- \neq ()$ and $A^+ \neq ()$ do
 6      Let $Q$ be a sequence of elements of $A^+$ such that $L \leq S + score(Q) \leq L + M$, if one exists, or
        $Q = A^+$ otherwise
 7      Assign the elements of $Q$ to $A'[j \ldots j + |Q| - 1]$
 8      $j \leftarrow j + |Q|$
 9      $A^+ \leftarrow A^+ \setminus Q$
11      Let $R$ be a minimal sequence of elements of $A^-$ such that $S + score(Q) + score(R) < L$, if one
        exists, or $R = A^-$ otherwise
12      $S \leftarrow \max\{0, S + score(Q) + score(R)\}$
13      Assign the elements of $R$ to $A'[j \ldots j + |R| - 1]$
14      $j \leftarrow j + |R|$
15      $A^- \leftarrow A^- \setminus R$
16  Assign the elements of $A^- \cup A^+$ to $A'[j \ldots j + |A^- \cup A^+| - 1]$
17  return $A'$
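As an illustration only, the following Python sketch gives one possible reading of Algorithm 3. Several choices are assumptions not fixed by the pseudocode: $Q$ and $R$ are built greedily, one element at a time; zeros are grouped with the positive elements; and $S$ is updated incrementally rather than through $score(Q)$ and $score(R)$. The function names are hypothetical.

```python
def max_scoring_sum(seq):
    """Kadane's algorithm; the empty subsequence scores 0."""
    best = cur = 0
    for v in seq:
        cur = max(0, cur + v)
        best = max(best, cur)
    return best


def parametrized_sorting(a, L):
    """Greedy sketch of ParametrizedSorting(A, L); assumes L >= max(a)."""
    pos = [v for v in a if v >= 0]   # assumption: zeros go with the positives
    neg = [v for v in a if v < 0]
    out = []
    S = 0                            # score of the interval under construction
    while pos and neg:
        # Q: add positive elements until the running score reaches L.
        while pos and S < L:
            v = pos.pop()
            out.append(v)
            S += v
        # R: add negative elements until the running score drops below L.
        while neg and S >= L:
            v = neg.pop()
            out.append(v)
            S += v
        S = max(0, S)                # a negative score closes the interval
    out.extend(pos)                  # at most one of the two lists is non-empty
    out.extend(neg)
    return out


a = [9, -10, 9, -10, 10]
result = parametrized_sorting(a, 10)
print(result, max_scoring_sum(result))
```

Every element is appended exactly once, so the sketch runs in linear time, in line with the bound stated in Lemma 3.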



The key of our approximation algorithm is to provide Algorithm ParametrizedSorting with an
appropriate lower bound parameter. The most immediate one is $L = \max\{0, M, score(A) - M\}$, which,
however, does not capture the contribution of the negative members of $A$ whose values are smaller than
$-L$ when $A$ contains at least one nonnegative element. In order to circumvent this difficult case of
Lemma 3, assume that $A^*$ is an optimum solution and $OPT = score^*(A^*)$. According to (3), we need to
find a new value for $L$ such that $score({N'}_{\ell'-1}^{\ell'}) \leq L \leq OPT$, where $I_{\ell'}$ is the last interval of the sequence $A'$
returned by ParametrizedSorting$(A, L)$, with the purpose of having $score^*(A') \leq 2 \cdot OPT$.

We first argue that we can assume that the last element $a^*$ of $A^*$ is such that $a^* \geq -OPT$: if $M > 0$
and $a^* < -OPT$, then construct a new optimum solution from $A^*$ by moving $a^*$ to the first position. Since
the first interval of the new sequence contains $a^*$ only, the scores of the remaining subsequences are
not affected. Hence, the new sequence is still optimum. If necessary, repeat this operation until the last
element is at least as large as $-OPT$. The optimum sequence so obtained is such that if $a < -L$, then
$a$ is not in the last interval. It follows that either $\max\{0, M, score(A) - M\} = 0$ or, by Observation 4, a
strengthened lower bound $L$ can be chosen satisfying the inequality



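$$L \geq score(A) - \sum_{a_i \in B_L} (a_i + L),$$

where $B_L = \{a_i \in A \mid a_i < -L\}$.
Writing the inequality for $L$ as

$$L \geq score(A) + \sum_{a_i \in B_L} (-a_i - L) = score(A \setminus B_L) - L\,|B_L|,$$

that is,

$$L \geq \frac{score(A \setminus B_L)}{|B_L| + 1}, \tag{4}$$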



we define the two-phase Algorithm ApproxSorting$(A)$. Its first phase consists in determining the
smallest $L$ satisfying (4) and the conditions of Lemma 3. To do so, set $L = \max\{0, M, score(A) - M\}$
and take the elements of a decreasing sequence $P$ on the set $\{-a \mid a \in A, a < -L\}$ (note that, by definition,
all elements of $P$ are distinct). If $L > 0$ and $P \neq ()$, then write this sequence as $P = (p_1, p_2, \ldots, p_{|P|})$, find
the maximal index $k$ (in the range from 1 to $|P|$) such that $p_k > score(A \setminus B_{p_k})/k$, and set $L = score(A \setminus
B_{p_k})/k$. The second phase is simply a call ParametrizedSorting$(A, L)$ to produce a permutation $A'$
of $A$. Since each interval of $A'$ having $score(I'_i) < 0$ has an $a \in B_L$ as last element, we get $score(I'_i) \geq
a + L$. Observation 4 leads to $score({N'}_{\ell'-1}^{\ell'}) \leq L$. Therefore, Lemma 3 gives the following result.

Theorem 3. ApproxSorting is a 2-approximation algorithm for the SSS problem and a $(3k+1)/2k$-
approximation algorithm for the SSS$(k,s)$ problem which runs in $O(n \log n)$ time.

Proof. The approximation factors stem directly from Lemma 3 and $M \leq score^*(A)$. In particular, for the
SSS$(k,s)$ case, inequality (4) gives $L = ks/(k+1)$. Considering that $s/2$ is an upper bound for all positive
elements, we get $1 + M/score^*(A) \leq (3k+1)/2k$, which leads to the claimed approximation factor for
this case. □
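The arithmetic behind the factor for the SSS$(k,s)$ case can be spelled out as follows. This is a sketch under the assumptions that the positive elements sum to $ks$, that $L < s$ (so that $B_L$ contains all $k$ negative elements), and that the bound of Lemma 3 is attained by the term $L + M$ with $L \leq OPT$.

```latex
% Worked bound for SSS(k,s); assumes score(A \ B_L) = ks, |B_L| = k and M <= s/2.
\begin{align*}
L &= \frac{score(A \setminus B_L)}{|B_L| + 1} = \frac{ks}{k+1}, \\
\frac{L + M}{L} &\le 1 + \frac{s/2}{ks/(k+1)} = 1 + \frac{k+1}{2k} = \frac{3k+1}{2k}.
\end{align*}
```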

A final remark that can be made in connection with algorithm ApproxSorting for the SSS problem
is that the approximation factor of 2 is tight. To see this, consider $x > 0$ and $x/2 < y < x$. The
sequence $A'$ returned by the call ParametrizedSorting$((y, -x, y, -x, x), x)$ is either $(y, y, -x, x, -x)$,
or $(y, x, -x, y, -x)$, or $(x, -x, y, y, -x)$. It follows that $2y \leq score^*(A') \leq y + x$. Then, since $OPT = x$,
$score^*(A')/OPT \to 2$ as $x - y \to 0$.
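The tightness claim can be checked numerically on a concrete pair of values. The sketch below uses the illustrative choice $x = 10$, $y = 9$ and a hypothetical helper `tightness_demo`; it scores the three candidate outputs with Kadane's algorithm and finds the optimum by exhaustive search over all permutations, which is only viable for such tiny inputs.

```python
from itertools import permutations


def max_scoring_sum(seq):
    """Kadane's algorithm; the empty subsequence scores 0."""
    best = cur = 0
    for v in seq:
        cur = max(0, cur + v)
        best = max(best, cur)
    return best


def tightness_demo(x, y):
    """Return the scores of the three candidate outputs and the brute-force optimum."""
    candidates = [
        (y, y, -x, x, -x),
        (y, x, -x, y, -x),
        (x, -x, y, y, -x),
    ]
    scores = [max_scoring_sum(c) for c in candidates]
    opt = min(max_scoring_sum(p) for p in permutations((y, -x, y, -x, x)))
    return scores, opt


scores, opt = tightness_demo(10, 9)
print(scores, opt)  # every score lies between 2y = 18 and x + y = 19; opt equals x = 10
```

With these values the ratio is at least $18/10$, and it approaches 2 as $y$ is chosen closer to $x$.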



6 Concluding remarks 

We investigated two problems related to maximum scoring subsequences of a sequence, namely the
INSERTION IN A SEQUENCE WITH SCORES (ISS) and SORTING A SEQUENCE BY SCORES (SSS)
problems. For the ISS problem, we provided a linear time solution, and for the SSS one we proved its
NP-hardness and gave a 2-approximation algorithm. An additional remark with regard to the ISS
problem is that our solution extends immediately to its circular version, where to the set of subsequences we
add each $(A_i^n, A_0^j)$ such that $1 \leq j \leq i < n$. The core of this extension is to modify Kadane's algorithm
so that it takes the new subsequences into account as well. This can be done by means of a preliminary
$O(n)$ step computing $E[i] = \max_{i \leq k \leq n}\{score(A_k^n)\}$ for each $i \in \{1, \ldots, n-1\}$, so that $E[j] + score(A_0^j)$
gives the maximum sum of a circular subsequence ending in $a_j$, for each $j \in \{1, \ldots, n-1\}$. We can
then solve the circular version of the ISS problem as follows: if $x < 0$, we first find a (circular or not)
maximum scoring subsequence $S$ of $A$ which is minimal in size, and then proceed as before to find an
optimal insertion position inside of $S$. If $x > 0$, then we first use the extended version of Kadane's
algorithm to find the partition of $A$ into possibly circular intervals (the interval containing $a_n$ continues
with elements $a_1, a_2, \ldots, a_k$ for some $k \in \{0, \ldots, n-1\}$); next, we build a circularly shifted permutation
$A'$ of $A$ by moving elements $a_1, a_2, \ldots, a_k$ to the end of the sequence, and then apply the algorithm for
the non-circular case of ISS to $A'$, which clearly also gives a solution for $A$.
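The preliminary step with the array $E$ can be sketched as follows. This is a minimal illustration of the circular extension of Kadane's algorithm only, not of the full circular ISS procedure; the function name `max_circular_scoring_sum` is hypothetical, and Python's 0-based list `a` stores $a_1, \ldots, a_n$.

```python
def max_circular_scoring_sum(a):
    """Maximum score over ordinary and wrap-around subsequences of a."""
    n = len(a)

    # Ordinary (non-circular) maximum by Kadane's algorithm.
    best = cur = 0
    for v in a:
        cur = max(0, cur + v)
        best = max(best, cur)

    # E[j] = max over j <= k <= n of score(A_k^n), i.e. the best suffix
    # starting after position j; E[n] = 0 corresponds to the empty suffix.
    E = [0] * (n + 1)
    suffix = 0
    for k in range(n - 1, -1, -1):
        suffix += a[k]               # suffix == score(A_k^n)
        E[k] = max(E[k + 1], suffix)

    # A circular subsequence ending in a_j is a suffix followed by A_0^j.
    prefix = 0
    for j in range(1, n):
        prefix += a[j - 1]           # prefix == score(A_0^j)
        best = max(best, E[j] + prefix)
    return best


print(max_circular_scoring_sum([5, -10, 4, 3]))  # wraps around: 4 + 3 + 5
```

Both passes are linear, so the extension keeps the $O(n)$ running time of the original algorithm.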

The SSS problem is also closely related to another set partitioning problem, called the MULTIPROCESSOR
SCHEDULING problem, stated as follows: given a multiset $C$ of positive integers and a positive
integer $m$, find a partition of $C$ into $m$ subsets $C_0, C_1, \ldots, C_{m-1}$ such that $\max_{i \in \{0, 1, \ldots, m-1\}} \{\sum_{a \in C_i} a\}$ is
minimized. Not surprisingly, given an instance $(C, m)$ of MULTIPROCESSOR SCHEDULING, an instance
of the SSS problem can be defined as an arbitrary permutation $A$ of the multiset $C'$ obtained from $C$ by
the inclusion of $m - 1$ occurrences of the negative integer $-score(C) - 1$, indicating that a solution of
the SSS problem for $A$ induces a solution of the MULTIPROCESSOR SCHEDULING problem for $(C, m)$.
This problem admits a polynomial time approximation scheme (PTAS) [13, 14] as well as list scheduling
heuristics producing a solution which is within a factor of $2 - 1/n$ ($n$ being the number of elements in
the input multiset $C$) from the optimal [15]. On the other hand, MAX-3-PARTITION, the optimization
version of the problem used in the proof of Theorem 2, is known to be APX-hard [16]. A natural open
question is, thus, whether there exists a polynomial time approximation algorithm with factor smaller
than 2 for the SSS problem. In this regard, note that although transferring our approximation factor
from the SSS problem to the MULTIPROCESSOR SCHEDULING problem is easy, the converse appears
harder, since we do not know in advance how many intervals there should be in an optimal
permutation $A'$ of $A$.
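The construction from MULTIPROCESSOR SCHEDULING can be tried on a toy instance. In the sketch below the function names are hypothetical, and the exhaustive search over permutations stands in for an exact SSS solver on tiny inputs only. Each separator $-score(C) - 1$ resets any running sum, so the groups of positive elements between separators play the role of the $m$ subsets.

```python
from itertools import permutations


def max_scoring_sum(seq):
    """Kadane's algorithm; the empty subsequence scores 0."""
    best = cur = 0
    for v in seq:
        cur = max(0, cur + v)
        best = max(best, cur)
    return best


def sss_instance_from_scheduling(C, m):
    """Multiset C plus m - 1 copies of the separator -score(C) - 1."""
    return list(C) + [-(sum(C) + 1)] * (m - 1)


def groups_of(perm):
    """Split a permutation at its negative separators into m groups."""
    groups = [[]]
    for v in perm:
        if v < 0:
            groups.append([])
        else:
            groups[-1].append(v)
    return groups


def brute_force_sss(instance):
    """Exhaustive optimum; exponential, for illustration only."""
    return min(permutations(instance), key=max_scoring_sum)


instance = sss_instance_from_scheduling([4, 3, 3, 2], 2)
best_perm = brute_force_sss(instance)
print(groups_of(best_perm), max_scoring_sum(best_perm))
```

On this instance the best permutation splits the jobs into two groups of load 6 each, which is also the optimal makespan for two processors.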

The SSS problem has the following generalization for multidimensional vectors of numbers, which
arises in the context of buffer minimization in radio networks. Given a matrix $M$, let $row_M(i)$ and $col_M(j)$
denote respectively the $i$-th row and the $j$-th column of $M$. The multidimensional version of the SSS
problem is then that of, given a $k \times n$ matrix $M$, with $k, n > 0$, finding a permutation $M'$ of the columns
of $M$ which minimizes $value(M') = \sum_{i=1}^{k} score^*(row_{M'}(i))$. This problem is hard even for very restricted
matrices.

Theorem 4. The multidimensional version of the SSS problem is NP-hard even if $M[i,j] \in \{-1, +1\}$
for all $i$ and $j$.

Proof. Consider the following polynomial-time reduction from the HAMILTONIAN PATH problem,
which is NP-complete [12]. Given an undirected graph $G = (V, E)$, with $V = \{1, \ldots, n\}$, let
$\bar{E} = \{S \subseteq V : |S| = 2 \text{ and } S \notin E\} = \{\{x_1, y_1\}, \ldots, \{x_k, y_k\}\}$. We assume that $k \geq 1$ because otherwise
the HAMILTONIAN PATH problem is trivial. Define $M$ as the $k \times n$ matrix such that $M[i,j] = +1$ if
$j \in \{x_i, y_i\}$ and $M[i,j] = -1$ otherwise. If there is a Hamiltonian path $P = (\ell_1, \ldots, \ell_n)$ in $G$, then let $M'$ be
the permutation of the columns of $M$ according to $P$. It turns out that $score^*(row_{M'}(i)) \geq 1$ for all $i$, since
$M'[i,j] = +1$ for each $\ell_j \in \{x_i, y_i\}$, which means that $value(M') \geq k$. However, it cannot be the case that
$score^*(row_{M'}(i)) > 1$ for some $i$, since otherwise there is $j$ such that $M'[i,j] = M'[i,j+1] = +1$, which
implies that $\{\ell_j, \ell_{j+1}\} \in \bar{E}$ and thus contradicts our assumption about $P$. Therefore $value(M') = k$.

Conversely, if $G$ has no Hamiltonian path, then, as in the previous case, $score^*(row_{M'}(i)) \geq 1$ for
all $i$. However, since no permutation $P = (\ell_1, \ldots, \ell_n)$ is a Hamiltonian path of $G$, there
are $j$ and $i$ such that $\{\ell_j, \ell_{j+1}\} = \{x_i, y_i\} \in \bar{E}$, which implies that $score^*(row_{M'}(i)) \geq 2$. Therefore
$value(M') \geq (k - 1) + 2 > k$, completing the proof. □
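The reduction can be exercised on two four-vertex graphs. The sketch below uses hypothetical function names and an exhaustive search over column orders, so it is meant only for very small graphs. It builds one row per non-edge and compares the minimum value with the number of rows $k$.

```python
from itertools import combinations, permutations


def max_scoring_sum(seq):
    """Kadane's algorithm; the empty subsequence scores 0."""
    best = cur = 0
    for v in seq:
        cur = max(0, cur + v)
        best = max(best, cur)
    return best


def matrix_from_graph(n, edges):
    """One row per non-edge {x, y}: +1 in columns x and y, -1 elsewhere."""
    edge_set = {frozenset(e) for e in edges}
    non_edges = [p for p in combinations(range(1, n + 1), 2)
                 if frozenset(p) not in edge_set]
    return [[1 if j in p else -1 for j in range(1, n + 1)] for p in non_edges]


def value(matrix, order):
    """Sum over rows of score* after permuting columns by a 1-based vertex order."""
    return sum(max_scoring_sum([row[v - 1] for v in order]) for row in matrix)


def min_value(n, edges):
    """Return (k, minimum value over all column permutations)."""
    M = matrix_from_graph(n, edges)
    best = min(value(M, p) for p in permutations(range(1, n + 1)))
    return len(M), best


path_graph = [(1, 2), (2, 3), (3, 4)]   # has a Hamiltonian path
star_graph = [(1, 2), (1, 3), (1, 4)]   # has no Hamiltonian path
print(min_value(4, path_graph), min_value(4, star_graph))
```

For the path graph the minimum value equals $k$, while for the star any vertex order places two leaves next to each other and the minimum exceeds $k$.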

Note, however, that the same is not true for the particular case $k = 1$ of the multidimensional SSS
problem: the SSS problem is polynomially solvable to optimality when every element $a_i \in A$ is such that
$a_i \in \{\beta, \gamma\}$, where $\beta, \gamma$ are two real numbers [11]. We leave it as an open question whether or not there
are constant factor approximation algorithms for the multidimensional version of the SSS problem in
its general form. A final remark is that the ISS problem can be generalized in a similar way for the case
of vectors of numbers: given a $k \times n$ matrix $M$ and a $k \times 1$ column $X$, find a position $p$ for inserting $X$
into $M$ that minimizes the value of the resulting matrix. Like the ISS problem, this one admits a trivial
polynomial solution: checking all the $n + 1$ possibilities and choosing an optimal one takes $O(n^2 \cdot k)$ time.
We wonder, however, whether more efficient algorithms exist.
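The trivial solution mentioned above can be sketched directly. The function names are hypothetical and the matrix is represented as a list of rows; each of the $n + 1$ candidate positions costs $O(n \cdot k)$ to evaluate, which gives the $O(n^2 \cdot k)$ total.

```python
def max_scoring_sum(seq):
    """Kadane's algorithm; the empty subsequence scores 0."""
    best = cur = 0
    for v in seq:
        cur = max(0, cur + v)
        best = max(best, cur)
    return best


def matrix_value(M):
    """Sum of the maximum scoring subsequence sums of the rows."""
    return sum(max_scoring_sum(row) for row in M)


def best_column_insertion(M, X):
    """Try all n + 1 insertion positions for column X; return (position, value)."""
    n = len(M[0]) if M else 0
    best_pos, best_val = None, None
    for p in range(n + 1):
        candidate = [row[:p] + [x] + row[p:] for row, x in zip(M, X)]
        val = matrix_value(candidate)
        if best_val is None or val < best_val:
            best_pos, best_val = p, val
    return best_pos, best_val


M = [[3, 2], [1, 4]]
X = [-4, -2]
print(best_column_insertion(M, X))  # inserting between the two columns is best here
```

Ties are broken in favor of the leftmost position, which is an arbitrary choice of this sketch.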